Generate gene reference files #47
Merged
j23414 force-pushed the generate-gene-reference-files branch from 58099ae to 4102012 (May 8, 2024 16:30)
genehack reviewed (May 8, 2024)
j23414 added a commit to nextstrain/rsv that referenced this pull request (May 8, 2024)
This is a fixup to an earlier commit, 8cd6a13. It updates the docs to reflect that the script will NOT just throw a warning but will actually error out if the gene is not found in the GenBank file. This was flagged by comment: nextstrain/dengue#47 (comment)
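The behavior described above (stop with an error instead of warning when the gene is missing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from `newreference.py`: the simplified feature tuples, the `find_gene` helper, and the `GeneNotFoundError` class are hypothetical, and the sketch assumes the gene name is stored in a `gene` qualifier on a CDS feature.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified stand-in for parsed GenBank features:
# (feature type, qualifiers, start, end), 0-based and end-exclusive.
Feature = Tuple[str, Dict[str, str], int, int]


class GeneNotFoundError(ValueError):
    """Raised when no CDS feature carries the requested gene name."""


def find_gene(features: List[Feature], gene: str) -> Feature:
    """Return the first CDS feature whose 'gene' qualifier equals `gene`.

    Raises instead of warning and continuing, so a misspelled or missing
    gene name stops the workflow instead of producing an empty reference.
    """
    for feature in features:
        ftype, qualifiers, _start, _end = feature
        if ftype == "CDS" and qualifiers.get("gene") == gene:
            return feature
    raise GeneNotFoundError(f"Gene {gene!r} not found in GenBank features")


features = [
    ("source", {}, 0, 100),
    ("CDS", {"gene": "C"}, 10, 40),
    ("CDS", {"gene": "E"}, 40, 90),
]
print(find_gene(features, "E")[2:])  # (40, 90)
```

Failing loudly here is the safer design for a pipeline: a warning is easy to miss in CI logs, while an exception halts the rule that requested the missing gene.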
genehack approved these changes (May 8, 2024)
I was wondering why the CI was taking so long, then remembered that the example files get connected to "phylogenetic/data". Fixed with: 30b1d5a
https://github.com/nextstrain/rsv/blob/a1788ce2c9c4375fb5a06d1426c64c45cf90225f/scripts/newreference.py
fixup: fix comments to match behavior
Co-authored-by: John SJ Anderson <[email protected]>
Adds some wildcard constraints on serotype-gene combinations to avoid unchecked wildcard matching, such as having {serotype}.fasta match both "denv1_E.fasta" and "denv1.fasta".
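Why the constraints matter can be shown with plain regular expressions. Snakemake wildcards match `.+` by default, so the output pattern `{serotype}.fasta` can also claim `denv1_E.fasta` by binding `serotype` to `denv1_E`. The sketch below approximates Snakemake's matching with `re.fullmatch`; the `SEROTYPE` and `GENE` constraint values and both helper functions are hypothetical, and the workflow's actual `wildcard_constraints` may differ.

```python
import re
from typing import Optional, Tuple

# Snakemake wildcards match the regex ".+" by default.
DEFAULT = ".+"

# Hypothetical constraint values for illustration only; the workflow's real
# wildcard_constraints may list different serotypes or genes.
SEROTYPE = r"denv[1-4]|all"
GENE = r"genome|E"


def bind_serotype(filename: str, serotype_re: str) -> Optional[str]:
    """Value "{serotype}" would take for the target "{serotype}.fasta", or None."""
    m = re.fullmatch(rf"(?P<serotype>{serotype_re})\.fasta", filename)
    return m.group("serotype") if m else None


def bind_serotype_gene(filename: str) -> Optional[Tuple[str, str]]:
    """(serotype, gene) bound by "{serotype}_{gene}.fasta" under the constraints."""
    m = re.fullmatch(rf"(?P<serotype>{SEROTYPE})_(?P<gene>{GENE})\.fasta", filename)
    return (m.group("serotype"), m.group("gene")) if m else None


# Unconstrained, "{serotype}.fasta" also claims the gene-level file:
print(bind_serotype("denv1_E.fasta", DEFAULT))   # denv1_E
# Constrained, each file matches only the pattern meant for it:
print(bind_serotype("denv1_E.fasta", SEROTYPE))  # None
print(bind_serotype_gene("denv1_E.fasta"))       # ('denv1', 'E')
```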
This is in preparation for having separate genome and gene (e.g. E, NS1) reference files.
This is in preparation for nesting each gene's specific files in subdirectories (e.g. `results/E/tree.nwk`), as suggested in comment: nextstrain/private#102 (comment)
In preparation for building "genome" and "E" intermediate and final files for the phylogenetic pipeline.
Move the gene annotation to the top of the CDS to match the other GenBank files (denv1, denv3, denv4).
This generates the reference_serotype_gene.gb and reference_serotype_gene.fasta files for each serotype. These files can then be used in augur align, augur translate, and optionally nextclade align when building the gene trees.
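The FASTA half of this step amounts to slicing the gene's coordinates out of the genome sequence and writing it as a single record. The sketch below is a minimal, hypothetical illustration in plain Python, not the workflow's script: the `gene_reference_fasta` helper, the record ID, and the 60-column line width are assumptions, and writing the matching `.gb` file is not covered (a library such as Biopython's `SeqIO` would normally handle that).

```python
from typing import Tuple


def gene_reference_fasta(
    serotype: str, gene: str, genome: str, start: int, end: int, width: int = 60
) -> Tuple[str, str]:
    """Return (filename, FASTA text) for one gene sliced out of a genome.

    `start`/`end` are 0-based, end-exclusive (Python slice) coordinates;
    GenBank feature locations are 1-based and inclusive, so a feature
    written as 10..20 corresponds to start=9, end=20.
    """
    if not 0 <= start < end <= len(genome):
        raise ValueError(f"invalid coordinates {start}..{end} for length {len(genome)}")
    seq = genome[start:end]
    # Wrap the sequence into fixed-width lines, as FASTA files usually are.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    name = f"reference_{serotype}_{gene}"  # hypothetical record ID
    return f"{name}.fasta", ">" + name + "\n" + "\n".join(lines) + "\n"


filename, text = gene_reference_fasta("denv2", "E", "ACGT" * 30, 8, 20, width=5)
print(filename)  # reference_denv2_E.fasta
print(text, end="")
```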
j23414 force-pushed the generate-gene-reference-files branch from 30b1d5a to f5b7bf6 (May 8, 2024 23:33)
j23414 added a commit to nextstrain/rsv that referenced this pull request (Jun 5, 2024)
Description of proposed changes
In order to support gene phylogenetic trees (e.g. E gene trees), add rules to automatically generate gene reference GenBank and FASTA files (e.g. `reference_denv4_E.gb` and `reference_denv4_E.fasta`), following the rules used in RSV. This is part of a larger, older issue of creating E gene builds, which is being split into smaller PRs to maintain QC and keep the scope of review manageable. This PR will not generate an E gene phylogenetic tree; subsequent PRs will modify these rules to generate the trees.
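The set of files these rules are expected to produce is one GenBank and one FASTA file per serotype/gene pair. A minimal sketch of that enumeration is below; the `SEROTYPES` and `GENES` lists and the `reference_targets` helper are hypothetical, the directory prefix is omitted because it is not stated here, and the list mirrors the combinations (not necessarily the order) that Snakemake's `expand()` would give with its default product combinator.

```python
from itertools import product
from typing import List

# Hypothetical serotype and gene lists; the workflow's configured values may differ.
SEROTYPES = ["denv1", "denv2", "denv3", "denv4"]
GENES = ["E"]


def reference_targets(serotypes: List[str], genes: List[str]) -> List[str]:
    """File names for every serotype/gene pair, one GenBank and one FASTA each."""
    return [
        f"reference_{s}_{g}.{ext}"
        for s, g, ext in product(serotypes, genes, ["gb", "fasta"])
    ]


targets = reference_targets(SEROTYPES, GENES)
print(len(targets))  # 8
print(targets[-2:])  # ['reference_denv4_E.gb', 'reference_denv4_E.fasta']
```

Adding a gene such as NS1 later would only mean extending the gene list; the file-name pattern stays the same.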
Visual summary (view whole pipeline plan so far)
Related issue(s)
Checklist
Example shortened reference_denv2_E.gb